Conceptual feature generation for textual information using a conceptual network constructed from Wikipedia

Authors

  • Amir Hossein Jadidinejad
  • Fariborz Mahmoudi
  • M. R. Meybodi
Abstract

A proper semantic representation of textual information underlies many natural language processing tasks. In this paper, a novel semantic annotator is presented that generates conceptual features for text documents. A comprehensive conceptual network is constructed automatically with the aid of Wikipedia and represented as a Markov chain. The semantic annotator takes a fragment of natural language text and initiates a random walk to generate conceptual features that represent the topical semantics of the input text. The generated conceptual features are applicable to many natural language processing tasks in which the input is textual information and the output is a decision based on its context. Accordingly, the effectiveness of the generated features is evaluated on document clustering and classification. Empirical results demonstrate that representing text with conceptual features and considering the relations between concepts can significantly improve on not only the bag-of-words representation but also other state-of-the-art approaches.
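The abstract describes the pipeline only at a high level: a concept network treated as a Markov chain, and a random walk started from the input text. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a random walk with restart as the walk variant, naive substring matching to find seed concepts, and a tiny hand-made link graph; the names `LINKS`, `transition_probs`, `seed_distribution`, and `conceptual_features` are hypothetical.

```python
# Toy concept network (hypothetical data): each concept links to related concepts.
# Every concept has at least one outgoing link, so probability mass is conserved.
LINKS = {
    "Markov chain": ["Random walk", "Probability"],
    "Random walk": ["Markov chain", "Probability"],
    "Probability": ["Markov chain", "Random walk", "Statistics"],
    "Statistics": ["Probability"],
    "Football": ["Sport"],
    "Sport": ["Football"],
}


def transition_probs(links):
    """Row-normalise the link structure into Markov-chain transition probabilities."""
    return {src: {dst: 1.0 / len(dsts) for dst in dsts} for src, dsts in links.items()}


def seed_distribution(text, links):
    """Uniform distribution over concepts whose titles occur in the text.

    Assumption: plain case-insensitive substring matching stands in for a real annotator.
    """
    lowered = text.lower()
    hits = [c for c in links if c.lower() in lowered]
    if not hits:
        raise ValueError("no known concept mentioned in the text")
    return {c: (1.0 / len(hits) if c in hits else 0.0) for c in links}


def conceptual_features(text, links, restart=0.15, iterations=50):
    """Score every concept by a random walk with restart seeded from the text."""
    probs = transition_probs(links)
    seed = seed_distribution(text, links)
    scores = dict(seed)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to a seed concept;
        # otherwise it follows an outgoing link of its current concept.
        nxt = {c: restart * seed[c] for c in links}
        for src, mass in scores.items():
            for dst, p in probs[src].items():
                nxt[dst] += (1.0 - restart) * mass * p
        scores = nxt
    return scores


features = conceptual_features("A random walk on a Markov chain", LINKS)
for concept, weight in sorted(features.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {weight:.3f}")
```

In this sketch, concepts that are never mentioned but are linked to the mentioned ones (here "Probability" and "Statistics") still receive weight, while unrelated concepts such as "Football" stay at zero. That is the sense in which considering relations between concepts can enrich a bag-of-words view of the text.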


Similar resources

Towards constructing an Integrative, Multi-Level Model for Cognition: The Function of Semantic Networks

Integrated approaches try to connect different constructs in different theories and reinterpret them using a common conceptual framework. In this research, using the concept of processing levels, an integrated, three-level model of the cognitive systems has been proposed and evaluated. Processing levels are divided into three categories of Feature-Oriented, Semantic and Conceptual Level based o...


Requirements for Third Generation University: A Conceptual Review of Iranian Studies

Introduction: For more than a decade, Iranian researchers have been concerned with the third-generation university and have conducted various studies in this field. Collecting and analyzing the ideas presented in these studies may pave the way for finding a path for transformation into a third-generation university. This study tries to introduce the requirements of moving Iranian universities t...


An automatic approach for ontology-based feature extraction from heterogeneous textual resources

Data mining algorithms such as data classification or clustering methods exploit features of entities to characterise, group or classify them according to their resemblance. In the past, many feature extraction methods focused on the analysis of numerical or categorical properties. In recent years, motivated by the success of the Information Society and the WWW, which has made available enormou...


Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents

In this paper, we propose a novel method for conceptual hierarchical clustering of documents using knowledge extracted from Wikipedia. The proposed method overcomes the disadvantages of classic bag-of-words models by exploiting Wikipedia's textual content and link structure. A robust and compact document representation is built in real-time using the Wikipedia application programmer’s ...


Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, knowledge repositories must be used to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...



Journal title:
  • Expert Systems

Volume 33

Publication date 2016